Evaluating Stream Filtering for Entity Profile Updates in TREC 2012, 2013, and 2014
Abstract
The Knowledge Base Acceleration (KBA) track ran in TREC 2012, 2013, and 2014 as an entity-centric filtering evaluation. The track evaluates systems that filter a time-ordered corpus for documents and slot fills that would change the profile of an entity in a predefined list of entities. Compared with the 2012 and 2013 evaluations, the 2014 evaluation introduced several refinements, including high-quality community metadata produced by running Raytheon/BBN’s Serif named entity recognizer, sentence parser, and relation extractor on 579,838,246 English documents in the corpus. We also expanded the query entities to be primarily long-tail entities that lacked Wikipedia profiles. We simplified the Streaming Slot Filling (SSF) scoring and added a third task component to highlight creative systems that used the KBA data. A successful KBA system must do more than resolve the meaning of entity mentions by linking documents to the knowledge base (KB): it must also distinguish novel, “vitally” relevant documents and slot fills that would change a target entity’s profile. This combines thinking from natural language understanding (NLU) and information retrieval (IR). Filtering tracks in TREC have typically used queries based on topics described by a set of keyword queries or short descriptions, and annotators have generated relevance judgments based on their personal interpretation of the topic. For TREC 2014, we selected a set of filter topics based on people, organizations, and facilities in the region between Seattle, Washington, and Vancouver, British Columbia: 86 people, 16 organizations, and 7 facilities. Assessors judged approximately 30,000 documents, which included most documents that mention a name from a handcrafted list of surface form names of the 109 target entities. TREC teams were provided with all of the ground truth data, divided into training and evaluation data. We present peak macro-averaged F_1 scores for all run submissions.
High-scoring systems used a variety of approaches, including feature engineering around linguistic structures, names of related entities, and various types of classifiers. Top-scoring systems achieved F_1 scores in the high 50s. We present results for a baseline system that performs in the low 40s. We discuss key lessons learned that motivate future tracks at the end of the paper.
Categories & Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Information Filtering; H.3.m [Information Storage and Retrieval]: Miscellaneous – Test Collections; I.2.7 [Natural Language Processing]: Text analysis – Language parsing and understanding
General Terms: Experimentation, Measurement
Introduction
This overview paper describes the progression of the KBA evaluation over its three years, from 2012 through 2014. TREC KBA is a stream filtering task focused on entity-level events in large volumes of data. Many large knowledge bases, such as Wikipedia, are maintained by small workforces of humans who cannot manually monitor all relevant content streams. As a result, most entity profiles lag far behind current events. KBA aims to help these scarce human resources by driving research on automatic systems for filtering streams of text for new information about entities. We refer to
Similar resources
Evaluating Stream Filtering for Entity Profile Updates for TREC 2013
The Knowledge Base Acceleration (KBA) track in TREC 2013 expanded the entity-centric filtering evaluation from TREC KBA 2012. This track evaluates systems that filter a time-ordered corpus for documents and slot fills that would change an entity profile in a predefined list of entities. We doubled the size of the KBA streamcorpus to twelve thousand contiguous hours and a billion documents from ...
A Related Entity based Approach for Knowledge Base Acceleration
In this paper we present the overview of our work in the TREC 2013 KBA Track. The goal is to find documents which may contribute to the update of knowledge base entries (e.g., Wikipedia or Freebase articles). Two tasks are introduced in this year’s track: (1) Cumulative Citation Recommendation (CCR), (2) Streaming Slot Filling (SSF). Particularly, we focus on the CCR task, follow our previous w...
MSR KMG at TREC 2014 KBA Track Vital Filtering Task
In this paper, we present our strategy for the TREC 2014 KBA track Vital Filtering task. This task was known as "Cumulative Citation Recommendation" or "CCR" in 2012 and 2013. The Vital Filtering task is to identify "vital" documents containing timely and new information that should be used to update the profile of a given entity (also called a topic). Our strategy for vital filtering is to first r...
Publication date: 2014